{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Output Streaming\n",
    "\n",
    "<div class=\"subtitle\">Stream Query Progress In Real Time</div>\n",
    "\n",
    "LMQL supports many forms of communicating query progress and results to the surrounding context, including the ability to stream intermediate values to the caller or a client connected via HTTP. \n",
    "\n",
    "This chapter first discusses the standard output writers supported out of the box, and then shows how to create your own custom output writers to implement more advanced streaming scenarios."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# setup lmql path (not shown in documentation, metadata has nbsphinx: hidden)\n",
    "import sys \n",
    "sys.path.append(\"../../../src/\")\n",
    "# load and set OPENAI_API_KEY\n",
    "import os \n",
    "os.environ[\"OPENAI_API_KEY\"] = open(\"../../../api.env\").read().split(\"\\n\")[1].split(\": \")[1].strip()\n",
    "\n",
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "# disable logit bias logging\n",
    "import lmql.runtime.bopenai.batched_openai as batched_openai\n",
    "batched_openai.set_logit_bias_logging(False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#notebooks.js:hidden\n",
    "import sys\n",
    "sys.path.append(\"../../../../lmql/src/\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Standard Output Writers"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To simply print the current query output to the standard output, you can use the `lmql.printing` output writer. This will show query progress during execution, as well as intermediate validation results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Q: Hello\n",
      "A: Hi there! How can I assist you?\n",
      "\n",
      " valid=True, final=fin\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[LMQLResult(prompt='Q: Hello\\n A: Hi there! How can I assist you?', variables={'WHAT': ' Hi there! How can I assist you?'}, distribution_variable=None, distribution_values=None)]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#notebooks.js:show_result=false\n",
    "await lmql.run(\"'Q: Hello\\\\n A:[WHAT]'\", \n",
    "               output_writer=lmql.printing)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, if you want to only stream the result for a specific variable, you can use the `lmql.stream(\"VAR\")` output writer. This will only print the result for the variable `VAR`, as it is generated by the query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello! How can I assist you today?"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[LMQLResult(prompt='<lmql:user/> Hello\\n <lmql:assistant/> Hello! How can I assist you today?', variables={'RESPONSE': ' Hello! How can I assist you today?'}, distribution_variable=None, distribution_values=None)]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#notebooks.js:show_result=false\n",
    "await lmql.run(\"'{:user} Hello\\\\n {:assistant}[RESPONSE]'\", \n",
    "               model=\"chatgpt\",\n",
    "               output_writer=lmql.stream(\"RESPONSE\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lastly, there are also the options `lmql.headless` and `lmql.silent`, which disable all input and output, or all output only, respectively. The difference between the two is that `headless` will raise an exception if the query asks for user input (via `input()`), while `silent` will still ask for user input but will not print anything to the standard output."
   ]
  },
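  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, to run a query without any console output (a sketch, reusing the query from above), you can pass `lmql.silent`:\n",
    "\n",
    "```python\n",
    "# suppress all console output; the query result is still returned\n",
    "result = await lmql.run(\"'Q: Hello\\\\n A:[WHAT]'\",\n",
    "                        output_writer=lmql.silent)\n",
    "```"
   ]
  },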
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Custom Output Writer"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "In addition to the standard output writers, you can also provide your own implementation and pass it via `output_writer=` when running a query. \n",
    "\n",
    "The basic interface for an output writer is as follows:\n",
    "\n",
    "```python\n",
    "class BaseOutputWriter:\n",
    "    async def input(self, *args):\n",
    "        \"\"\"\n",
    "        Handle user input with an input prompt of *args. This is invoked when a query asks for user input via `await input()`.\n",
    "\n",
    "        Returns:\n",
    "            str: The user input.\n",
    "        \"\"\"\n",
    "\n",
    "    async def add_interpreter_head_state(self, variable, head, prompt, where, trace, is_valid, is_final, mask, num_tokens, program_variables): \n",
    "        \"\"\"\n",
    "        Called whenever the query interpreter progresses in a meaningful way (e.g. new token added, new variable added, variable updated, etc.).\n",
    "\n",
    "        Parameters:\n",
    "            variable (str): \n",
    "                The name of the currently active variable.\n",
    "            head (int): \n",
    "                The index of the current interpretation head (deprecated, will always be 0).\n",
    "            prompt (str): \n",
    "                The full interaction trace/prompt of the query.\n",
    "            where (object): \n",
     "                The AST representation of the query's validation condition.\n",
    "            trace (object): \n",
    "                The evaluation trace of evaluating 'where' on the current program variables during generation.\n",
    "            is_valid (bool): \n",
    "                Whether the current program variables satisfy the validation condition.\n",
    "            is_final (bool): \n",
     "                Whether the value of 'is_valid' can be considered final (i.e. decoding more tokens will not change it).\n",
    "            mask (np.ndarray): \n",
    "                Currently active token mask.\n",
    "            num_tokens (int): \n",
    "                Number of tokens in the current 'prompt'.\n",
    "            program_variables (ProgramState): \n",
    "                The current program state (lmql.runtime.program_state). E.g. program_variables.variable_values is a mapping of variable names to their current values.\n",
    "        \"\"\"\n",
    "```"
   ]
  },
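  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustrative sketch of this interface (the class name `VariableLogger` is hypothetical, not part of LMQL), a minimal writer that records the latest value of each program variable could look like this:\n",
    "\n",
    "```python\n",
    "class VariableLogger:\n",
    "    # sketch of a custom output writer that collects variable values\n",
    "    def __init__(self):\n",
    "        self.values = {}\n",
    "\n",
    "    async def input(self, *args):\n",
    "        # delegate to the standard input() prompt\n",
    "        return input(*args)\n",
    "\n",
    "    async def add_interpreter_head_state(self, variable, head, prompt, where,\n",
    "                                         trace, is_valid, is_final, mask,\n",
    "                                         num_tokens, program_variables):\n",
    "        # record the most recent value of every program variable\n",
    "        if program_variables is not None:\n",
    "            self.values.update(program_variables.variable_values)\n",
    "```\n",
    "\n",
    "An instance of such a class can then be passed via `output_writer=VariableLogger()` when running a query, and its collected values inspected afterwards."
   ]
  },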
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "Based on this interface, you can implement your own output writer for custom streaming behavior. For examples of how this interface can be used, see the implementation of the standard output writers in `lmql.runtime.output_writer`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "lmql",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
